324 research outputs found

    Learning to generate one-sentence biographies from Wikidata

    We investigate the generation of one-sentence Wikipedia biographies from facts derived from Wikidata slot-value pairs. We train a recurrent neural network sequence-to-sequence model with attention to select facts and generate textual summaries. Our model incorporates a novel secondary objective that helps ensure it generates sentences that contain the input facts. The model achieves a BLEU score of 41, improving significantly upon the vanilla sequence-to-sequence model and scoring roughly twice that of a simple template baseline. Human preference evaluation suggests the model is nearly as good as the Wikipedia reference. Manual analysis explores content selection, suggesting the model can trade the ability to infer knowledge against the risk of hallucinating incorrect information.

    Interactions with conspecific outsiders as drivers of cognitive evolution

    Author Profiling for English and Arabic Emails

    This paper reports on some aspects of a research project aimed at automating the analysis of texts for the purpose of author profiling and identification. The Text Attribution Tool (TAT) was developed for the purpose of language-independent author profiling and has now been trained on two email corpora, English and Arabic. The complete analysis provides probabilities for the author’s basic demographic traits (gender, age, geographic origin, level of education and native language) as well as for five psychometric traits. The prototype system also provides a probability of a match with other texts, whether from known or unknown authors. A very important part of the project was the data collection and we give an overview of the collection process as well as a detailed description of the corpus of email data which was collected. We describe the overall TAT system and its components before outlining the ways in which the email data is processed and analysed. Because Arabic presents particular challenges for NLP, this paper also describes more specifically the text processing components developed to handle Arabic emails. Finally, we describe the Machine Learning setup used to produce classifiers for the different author traits and we present the experimental results, which are promising for most traits examined. The work presented in this paper was carried out while the authors were working at Appen Pty Ltd., Chatswood NSW 2067, Australia.

    A robust operational model for predicting where tropical cyclone waves damage coral reefs

    Tropical cyclone (TC) waves can severely damage coral reefs. Models that predict where to find such damage (the 'damage zone') enable reef managers to: 1) target management responses after major TCs in near-real time to promote recovery at severely damaged sites; and 2) identify spatial patterns in historic TC exposure to explain habitat condition trajectories. For damage models to meet these needs, they must be valid for TCs of varying intensity, circulation size and duration. Here, we map damage zones for 46 TCs that crossed Australia's Great Barrier Reef from 1985–2015 using three models – including one we develop which extends the capability of the others. We ground truth model performance with field data of wave damage from seven TCs of varying characteristics. The model we develop (4MW) out-performed the other models at capturing all incidences of known damage. The next best performing model (AHF) both under-predicted and over-predicted damage for TCs of various types. 4MW and AHF produce strikingly different spatial and temporal patterns of damage potential when used to reconstruct past TCs from 1985–2015. The 4MW model greatly enhances both of the main capabilities TC damage models provide to managers, and is useful wherever TCs and coral reefs co-occur.

    The use of singlebeam echo-sounder depth data to produce demersal fish distribution models that are comparable to models produced using multibeam echo-sounder depth

    Seafloor characteristics can help in the prediction of fish distribution, which is required for fisheries and conservation management. Despite this, only 5%–10% of the world's seafloor has been mapped at high resolution, as it is a time-consuming and expensive process. Multibeam echo-sounders (MBES) can produce high-resolution bathymetry and a broad swath coverage of the seafloor, but require greater financial and technical resources for operation and data analysis than singlebeam echo-sounders (SBES). In contrast, SBES provide comparatively limited spatial coverage, as only a single measurement is made from directly under the vessel. Thus, producing a continuous map requires interpolation to fill gaps between transects. This study assesses the performance of demersal fish species distribution models by comparing those derived from interpolated SBES data with full-coverage MBES distribution models. A Random Forest classifier was used to model the distribution of Abalistes stellatus, Gymnocranius grandoculis, Lagocephalus sceleratus, Loxodon macrorhinus, Pristipomoides multidens, and Pristipomoides typus, with depth and depth derivatives (slope, aspect, standard deviation of depth, terrain ruggedness index, mean curvature, and topographic position index) as explanatory variables. The results indicated that distribution models for A. stellatus, G. grandoculis, L. sceleratus, and L. macrorhinus performed poorly for MBES and SBES data with area under the receiver operator curves (AUC) below 0.7. Consequently, the distribution of these species could not be predicted by seafloor characteristics produced from either echo-sounder type. Distribution models for P. multidens and P. typus performed well for MBES and the SBES data with an AUC above 0.8. Depth was the most important variable explaining the distribution of P. multidens and P. typus in both MBES and SBES models. 
    While further research is needed, this study shows that in resource-limited scenarios, SBES can produce comparable results to MBES for use in demersal fish management and conservation.

    Molecular characterization of the uncultivatable hemotropic bacterium Mycoplasma haemofelis

    Mycoplasma haemofelis is a pathogenic feline hemoplasma. Despite its importance, little is known about its metabolic pathways or mechanism of pathogenicity due to it being uncultivatable. The recently sequenced M. haemofelis str. Langford 1 genome was analysed and compared to those of other available hemoplasma genomes.